148 research outputs found

    Accounting for horizontal gene transfers explains conflicting hypotheses regarding the position of aquificales in the phylogeny of Bacteria

    Background: Despite a large agreement between ribosomal RNA and concatenated protein phylogenies, the phylogenetic tree of the bacterial domain remains uncertain in its deepest nodes. For instance, the position of the hyperthermophilic Aquificales is debated, as their commonly observed position close to Thermotogales may proceed from horizontal gene transfers, long branch attraction or compositional biases, and may not represent vertical descent. Indeed, another view, based on the analysis of rare genomic changes, places Aquificales close to epsilon-Proteobacteria. Results: To get a whole genome view of Aquifex relationships, all trees containing sequences from Aquifex in the HOGENOM database were surveyed. This study revealed that Aquifex is most often found as a neighbour to Thermotogales. Moreover, informational genes, which appeared to be less often transferred to the Aquifex lineage than non-informational genes, most often placed Aquificales close to Thermotogales. To ensure these results did not come from long branch attraction or compositional artefacts, a subset of carefully chosen proteins from a wide range of bacterial species was selected for further scrutiny. Among these genes, two phylogenetic hypotheses were found to be significantly more likely than the others: the most likely hypothesis placed Aquificales as a neighbour to Thermotogales, and the second one with epsilon-Proteobacteria. We characterized the genes that supported each of these two hypotheses, and found that differences in rates of evolution or in amino-acid compositions could not explain the presence of two incongruent phylogenetic signals in the alignment. Instead, evidence for a large Horizontal Gene Transfer between Aquificales and epsilon-Proteobacteria was found. Conclusion: Methods based on concatenated informational proteins and methods based on character cladistics led to different conclusions regarding the position of Aquificales because this lineage has undergone many horizontal gene transfers. However, if a tree of vertical descent can be reconstructed for Bacteria, our results suggest Aquificales should be placed close to Thermotogales.

    Detecting lateral gene transfers by statistical reconciliation of phylogenetic forests

    Background: To understand the evolutionary role of Lateral Gene Transfer (LGT), accurate methods are needed to identify transferred genes and infer their timing of acquisition. Phylogenetic methods are particularly promising for this purpose, but the reconciliation of a gene tree with a reference (species) tree is computationally hard. In addition, the application of these methods to real data raises the problem of sorting out real and artifactual phylogenetic conflict. Results: We present Prunier, a new method for phylogenetic detection of LGT based on the search for a maximum statistical agreement forest (MSAF) between a gene tree and a reference tree. The program is flexible as it can use any definition of "agreement" among trees. We evaluate the performance of Prunier and two other programs (EEEP and RIATA-HGT) for their ability to detect transferred genes in realistic simulations where gene trees are reconstructed from sequences. Prunier proposes a single scenario that compares to the other methods in terms of sensitivity, but shows higher specificity. We show that LGT scenarios carry a strong signal about the position of the root of the species tree and could be used to identify the direction of evolutionary time on the species tree. We use Prunier on a biological dataset of 23 universal proteins and discuss their suitability for inferring the tree of life. Conclusions: The ability of Prunier to take into account branch support in the process of reconciliation allows a gain in complexity, in comparison to EEEP, and in accuracy in comparison to RIATA-HGT. Prunier's greedy algorithm proposes a single scenario of LGT for a gene family, but its quality always compares to the best solutions provided by the other algorithms. When the root position is uncertain in the species tree, Prunier is able to infer a scenario per root at a limited additional computational cost and can easily run on large datasets. Prunier is implemented in C++, using the Bio++ library and the phylogeny program Treefinder. It is available at: http://pbil.univ-lyon1.fr/software/prunier

    A Branch-Heterogeneous Model of Protein Evolution for Efficient Inference of Ancestral Sequences

    Most models of nucleotide or amino acid substitution used in phylogenetic studies assume that the evolutionary process has been homogeneous across lineages and that composition of nucleotides or amino acids has remained the same throughout the tree. These oversimplified assumptions are refuted by the observation that compositional variability characterizes extant biological sequences. Branch-heterogeneous models of protein evolution that account for compositional variability have been developed, but are not yet in common use because of the large number of parameters required, leading to high computational costs and potential overparameterization. Here, we present a new branch-nonhomogeneous and nonstationary model of protein evolution that captures more accurately the high complexity of sequence evolution. This model, henceforth called Correspondence and likelihood analysis (COaLA), makes use of a correspondence analysis to reduce the number of parameters to be optimized through maximum likelihood, focusing on most of the compositional variation observed in the data. The model was thoroughly tested on both simulated and biological data sets to show its high performance in terms of data fitting and CPU time. COaLA efficiently estimates ancestral amino acid frequencies and sequences, making it relevant for studies aiming at reconstructing and resurrecting ancestral amino acid sequences. Finally, we applied COaLA on a concatenate of universal amino acid sequences to confirm previous results obtained with a nonhomogeneous Bayesian model regarding the early pattern of adaptation to optimal growth temperature, supporting the mesophilic nature of the Last Universal Common Ancestor.

    Databases of homologous gene families for comparative genomics

    Background: Comparative genomics is a central step in many sequence analysis studies, from gene annotation and the identification of new functional regions in genomes, to the study of evolutionary processes at the molecular level (speciation, single gene or whole genome duplications, etc.) and phylogenetics. In that context, databases providing users high quality homologous families and sequence alignments as well as phylogenetic trees based on state of the art algorithms are becoming indispensable. Methods: We developed an automated procedure allowing massive all-against-all similarity searches, gene clustering, multiple alignments computation, and phylogenetic trees construction and reconciliation. The application of this procedure to a very large set of sequences is possible through parallel computing on a large computer cluster. Results: Three databases were developed using this procedure: HOVERGEN, HOGENOM and HOMOLENS. These databases share the same architecture but differ in their content. HOVERGEN contains sequences from vertebrates, HOGENOM is mainly devoted to completely sequenced microbial organisms, and HOMOLENS is devoted to metazoan genomes from Ensembl. Access to the databases is provided through Web query forms, a general retrieval system and a client-server graphical interface. The latter can be used to perform tree-pattern based searches which, among other uses, make it possible to retrieve sets of orthologous genes. The three databases, as well as the software required to build and query them, can be used or downloaded from the PBIL (Pôle Bioinformatique Lyonnais) site at http://pbil.univ-lyon1.fr/

    Extreme halophilic archaea derive from two distinct methanogen Class II lineages

    Phylogenetic analyses of conserved core genes have disentangled most of the ancient relationships in Archaea. However, some groups remain debated, like the DPANN, a deep-branching super-phylum composed of nanosized archaea with reduced genomes. Among these, the Nanohaloarchaea require high-salt concentrations for growth. Their discovery in 2012 was significant because they represent, together with Halobacteria (a Class belonging to Euryarchaeota), the only two described lineages of extreme halophilic archaea. The phylogenetic position of Nanohaloarchaea is highly debated, being alternatively proposed as the sister-lineage of Halobacteria or a member of the DPANN super-phylum. Pinpointing the phylogenetic position of extreme halophilic archaea is important to improve our knowledge of the deep evolutionary history of Archaea and the molecular adaptive processes and evolutionary paths that allowed their emergence. Using comparative genomic approaches, we identified 258 markers carrying a reliable phylogenetic signal. By combining strategies limiting the impact of biases on phylogenetic inference, we showed that Nanohaloarchaea and Halobacteria represent two independent lines that derived from two distinct but related methanogen Class II lineages. This implies that adaptation to high salinity emerged twice independently in Archaea and indicates that emergence of Nanohaloarchaea within DPANN in previous studies is likely the consequence of a tree reconstruction artifact, challenging the existence of this super-phylum.

    Codon contexts in enterobacterial and coliphage genes

    This investigation of the codon context of enterobacteria, plasmid, and phage protein genes was based on a search for correlations between the presence of one base type at codon position III and the presence of another base type at some other position in adjacent codons. Enterobacterial genes were compared with eukaryotic sequences for codon context effects. In enterobacterial genes, base usage at codon position III is correlated with the third position of the upstream adjacent codon and with all three positions of the downstream codon. Plasmid genes are free of context biases. Phage genes are heterogeneous: MS2 codons have no biased context, whereas lambda genes partly follow the trends of the host bacterium, and T7 genes have biased codon contexts that differ from those of the host. It has been reported that two successive third-codon positions tend to be occupied by two purines or two pyrimidines in Escherichia coli genes of low expression level. Here, the extent to which highly expressed protein genes can modulate base usage at two successive codon positions III, given the constraints on codon usage and protein sequence that act on them, was quantified. This demonstrates that the above-mentioned favored patterns are not a characteristic of weakly expressed genes but occur in all genes in which codon context can vary appreciably. The correlation between successive third-codon positions is a distinct feature of enterobacteria and of some phages, one that may result from adaptation of gene structure to translational efficiency. Conversely, codon context in yeast and human genes is biased, but for reasons unrelated to translation.

    Interfacing similarity search software with the sequence retrieval system ACNUC

    A method of interfacing sequence similarity search software with the fast sequence retrieval system ACNUC is described. The method is written in FORTRAN 77 and is straightforward to implement because no text-processing code is required: a minimum of 12 extra lines of FORTRAN provided the interface for most applications. The method is also efficient, since sequences are located by simple indexing techniques, with no linear searches of large database files necessary.

    Adaptation to Environmental Temperature Is a Major Determinant of Molecular Evolutionary Rates in Archaea

    Methods to infer the ancestral conditions of life are commonly based on geological and paleontological analyses. Recently, several studies used genome sequences to gain information about past ecological conditions, taking advantage of the property that the G+C and amino acid contents of bacterial and archaeal ribosomal DNA genes and proteins, respectively, are strongly influenced by the environmental temperature. The adaptation to optimal growth temperature (OGT) since the Last Universal Common Ancestor (LUCA) over the universal tree of life was examined, and it was concluded that LUCA was likely to have been a mesophilic organism and that a parallel adaptation to high temperature occurred independently along the two lineages leading to the ancestors of Bacteria on one side and of Archaea and Eukarya on the other side. Here, we focus on Archaea to gain a precise view of the adaptation to OGT over time in this domain. It has often been proposed on the basis of indirect evidence that the last archaeal common ancestor was a hyperthermophilic organism. Moreover, many results showed the influence of environmental temperature on the evolutionary dynamics of archaeal genomes: thermophilic organisms generally display lower evolutionary rates than mesophiles. However, to our knowledge, no study has tried to explain the differences in evolutionary rates for the entire archaeal domain or to investigate the evolution of substitution rates over time. A comprehensive archaeal phylogeny and a nonhomogeneous model of the molecular evolutionary process allowed us to estimate ancestral base and amino acid compositions and OGTs at each internal node of the archaeal phylogenetic tree. The last archaeal common ancestor is predicted to have been hyperthermophilic, and adaptations to cooler environments can be observed for extant mesophilic species. Furthermore, mesophilic species present both long branches and high variation of nucleotide and amino acid compositions since the last archaeal common ancestor. The increase of substitution rates observed in mesophilic lineages along all their branches can be interpreted as an ongoing adaptation to colder temperatures and to new metabolisms. We conclude that environmental temperature is a major factor that governs evolutionary rates in Archaea.

    WWW-query: an on-line retrieval system for biological sequence banks

    We have developed a World Wide Web (WWW) version of the sequence retrieval system Query: WWW-Query. This server allows users to query nucleotide sequence banks in the EMBL/GenBank/DDBJ formats and protein sequence banks in the NBRF/PIR format. WWW-Query includes all the features of the on-line sequence browsers already available: the ability to build complex queries, integration of cross-references with different data banks, and access to the functional zones of biological interest. It also provides original services not available elsewhere: the notion of re-usable sequence lists, integration of dedicated helper applications for visualizing alignments and phylogenetic trees, and links with multivariate methods for studying codon usage or for complementing phylogenies.